AITopics | stochastic ball model

stochastic ball model

BalLOT: Balanced $k$-means clustering with optimal transport

Luo, Wenyan, Mixon, Dustin G.

arXiv.org Machine Learning, Dec-8-2025

We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce an optimal transport approach to alternating minimization called BalLOT, and we show that it delivers a fast and effective solution to this problem. We establish this with a variety of numerical experiments before proving several theoretical guarantees. First, we prove that for generic data, BalLOT produces integral couplings at each step. Next, we perform a landscape analysis to provide theoretical guarantees for both exact and partial recoveries of planted clusters under the stochastic ball model. Finally, we propose initialization schemes that achieve one-step recovery of planted clusters.
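The alternating scheme mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' BalLOT implementation: the function name `balanced_two_means`, the restriction to two clusters, and the brute-force search that stands in for the optimal transport solve are all assumptions made for illustration, and the approach only scales to a handful of points.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))


def balanced_two_means(points, centers, max_iter=50):
    """Hypothetical helper: alternate a balanced assignment step and a
    centroid step for k = 2, with clusters forced to have equal size."""
    n = len(points)
    assert n % 2 == 0, "equal-size clusters need an even number of points"
    idx = range(n)
    prev = None
    for _ in range(max_iter):
        # Assignment step: cheapest split into two groups of size n/2.
        # Brute force stands in for the optimal transport solve (assumption).
        def cost(group):
            chosen = set(group)
            return sum(
                sq_dist(points[i], centers[0 if i in chosen else 1]) for i in idx
            )

        best = min(combinations(idx, n // 2), key=cost)
        if best == prev:  # assignment unchanged, so the iteration has converged
            break
        prev = best
        chosen = set(best)
        # Centroid step: each center moves to the mean of its group.
        centers = [
            mean([points[i] for i in idx if i in chosen]),
            mean([points[i] for i in idx if i not in chosen]),
        ]
    chosen = set(prev)
    labels = [0 if i in chosen else 1 for i in idx]
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels, centers = balanced_two_means(points, centers=[(0, 0), (1, 0)])
print(labels, centers)
```

Even though both initial centers sit inside the same group of points, the size constraint forces three points into each cluster on the first assignment step.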

algorithm, ballot, stochastic ball model, (15 more...)

arXiv.org Machine Learning

2512.05926

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre:

Workflow (0.66)
Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.83)

A Geometric Approach to $k$-means

Hong, Jiazhen, Qian, Wei, Chen, Yudong, Zhang, Yuqian

arXiv.org Artificial Intelligence, Oct-24-2025

$k$-means clustering is a fundamental problem in many scientific and engineering domains. The optimization problem associated with $k$-means clustering is nonconvex, for which standard algorithms are only guaranteed to find a local optimum. Leveraging the hidden structure of local solutions, we propose a general algorithmic framework for escaping undesirable local solutions and recovering the global solution or the ground truth clustering. This framework consists of iteratively alternating between two steps: (i) detect mis-specified clusters in a local solution, and (ii) improve the local solution by non-local operations. We discuss specific implementation of these steps, and elucidate how the proposed framework unifies many existing variants of $k$-means algorithms through a geometric perspective. We also present two natural variants of the proposed framework, where the initial number of clusters may be over- or under-specified. We provide theoretical justifications and extensive experiments to demonstrate the efficacy of the proposed approach.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TKDE.2025.3616858

2201.04822

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Jigsaw Game: Federated Clustering

Xu, Jinxuan, Chen, Hong-You, Chao, Wei-Lun, Zhang, Yuqian

arXiv.org Artificial Intelligence, Jul-17-2024

Federated learning has recently garnered significant attention, especially within the domain of supervised learning. However, despite the abundance of unlabeled data on end-users, unsupervised learning problems such as clustering in the federated setting remain underexplored. In this paper, we investigate the federated clustering problem, with a focus on federated k-means. We outline the challenge posed by its non-convex objective and data heterogeneity in the federated framework. To tackle these challenges, we adopt a new perspective by studying the structures of local solutions in k-means and propose a one-shot algorithm called FeCA (Federated Centroid Aggregation). FeCA adaptively refines local solutions on clients, then aggregates these refined solutions to recover the global solution of the entire dataset in a single round. We empirically demonstrate the robustness of FeCA under various federated scenarios on both synthetic and real-world data. Additionally, we extend FeCA to representation learning and present DeepFeCA, which combines Deep-Cluster and FeCA for unsupervised feature learning in the federated setting.

centroid, local solution, true center, (15 more...)

arXiv.org Artificial Intelligence

2407.12764

Country:

North America > United States > Ohio (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Structures of Spurious Local Minima in $k$-means

Qian, Wei, Zhang, Yuqian, Chen, Yudong

arXiv.org Machine Learning, Feb-21-2020

$k$-means clustering is a fundamental problem in unsupervised learning. The problem concerns finding a partition of the data points into $k$ clusters such that the within-cluster variation is minimized. Despite its importance and wide applicability, a theoretical understanding of the $k$-means problem has not been completely satisfactory. Existing algorithms with theoretical performance guarantees often rely on sophisticated (sometimes artificial) algorithmic techniques and restricted assumptions on the data. The main challenge lies in the non-convex nature of the problem; in particular, there exist additional local solutions other than the global optimum. Moreover, the simplest and most popular algorithm for $k$-means, namely Lloyd's algorithm, generally converges to such spurious local solutions both in theory and in practice. In this paper, we approach the $k$-means problem from a new perspective, by investigating the structures of these spurious local solutions under a probabilistic generative model with $k$ ground truth clusters. As soon as $k=3$, spurious local minima provably exist, even for well-separated and balanced clusters. One such local minimum puts two centers at one true cluster, and the third center in the middle of the other two true clusters. For general $k$, one local minimum puts multiple centers at a true cluster, and one center in the middle of multiple true clusters. Perhaps surprisingly, we prove that this is essentially the only type of spurious local minima under a separation condition. Our results pertain to the $k$-means formulation for mixtures of Gaussians or bounded distributions. Our theoretical results corroborate existing empirical observations and provide justification for several improved algorithms for $k$-means clustering.
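The $k=3$ configuration described in the abstract (two centers in one true cluster, the third midway between the other two) can be reproduced on a small example. The sketch below is a minimal illustration, not the paper's construction: the one-dimensional data, the two initializations, and the helper name `lloyd` are assumptions chosen so that the arithmetic is easy to follow.

```python
def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iteration on 1-D data; returns (centers, cost)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for x in points:
            j = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
            groups[j].append(x)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        ]
        if new_centers == centers:  # fixed point of the iteration
            break
        centers = new_centers
    cost = sum(min((x - c) ** 2 for c in centers) for x in points)
    return centers, cost


# Three well-separated, equal-size clusters around 0, 10 and 20 (toy data).
points = [-1.0, -0.5, 0.5, 1.0, 9.0, 9.5, 10.5, 11.0, 19.0, 19.5, 20.5, 21.0]

# Two centers start in the first cluster, one between the other two.
spurious_centers, spurious_cost = lloyd(points, [-1.0, 1.0, 14.0])

# One center starts near each cluster.
good_centers, good_cost = lloyd(points, [-1.0, 9.0, 19.0])

print(spurious_centers, spurious_cost)
print(good_centers, good_cost)
```

With the first initialization the iteration stops at a fixed point that splits the cluster around 0 between two centers and places the third center between the clusters around 10 and 20, at a far higher cost than the one-center-per-cluster solution reached from the second initialization.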

equation, local minimum, voronoi, (15 more...)

arXiv.org Machine Learning

2002.06694

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Hidden Integrality of SDP Relaxation for Sub-Gaussian Mixture Models

Fei, Yingjie, Chen, Yudong

arXiv.org Machine Learning, Mar-17-2018

We consider the problem of estimating the discrete clustering structures under Sub-Gaussian Mixture Models. Our main results establish a hidden integrality property of a semidefinite programming (SDP) relaxation for this problem: while the optimal solutions to the SDP are not integer-valued in general, their estimation errors can be upper bounded in terms of the error of an idealized integer program. The error of the integer program, and hence that of the SDP, are further shown to decay exponentially in the signal-to-noise ratio. To the best of our knowledge, this is the first exponentially decaying error bound for convex relaxations of mixture models, and our results reveal the "global-to-local" mechanism that drives the performance of the SDP relaxation. A corollary of our results shows that in certain regimes the SDP solutions are in fact integral and exact, improving on existing exact recovery results for convex relaxations. More generally, our results establish sufficient conditions for the SDP to correctly recover the cluster memberships of $(1-\delta)$ fraction of the points for any $\delta\in(0,1)$. As a special case, we show that under the $d$-dimensional Stochastic Ball Model, SDP achieves non-trivial (sometimes exact) recovery when the center separation is as small as $\sqrt{1/d}$, which complements previous exact recovery results that require constant separation.

artificial intelligence, machine learning, relaxation, (17 more...)

arXiv.org Machine Learning

1803.0651

Country:

North America > United States (0.14)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Data Science (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

On the tightness of an SDP relaxation of k-means

Iguchi, Takayuki, Mixon, Dustin G., Peterson, Jesse, Villar, Soledad

arXiv.org Machine Learning, May-18-2015

Recently, Awasthi et al. introduced an SDP relaxation of the $k$-means problem in $\mathbb R^m$. In this work, we consider a random model for the data points in which $k$ balls of unit radius are deterministically distributed throughout $\mathbb R^m$, and then in each ball, $n$ points are drawn according to a common rotationally invariant probability distribution. For any fixed ball configuration and probability distribution, we prove that the SDP relaxation of the $k$-means problem exactly recovers these planted clusters with probability $1-e^{-\Omega(n)}$ provided the distance between any two of the ball centers is $>2+\epsilon$, where $\epsilon$ is an explicit function of the configuration of the ball centers, and can be arbitrarily small when $m$ is large.
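The random model in this abstract can be sketched as a small sampler. This is an illustration under stated assumptions: the abstract allows any common rotationally invariant distribution, and the uniform distribution on the unit ball is picked here as one such choice. The function names `sample_unit_ball` and `stochastic_ball_model`, the seed, and the example centers are hypothetical.

```python
import math
import random


def sample_unit_ball(m, rng):
    """Uniform draw from the unit ball in R^m (one rotationally invariant choice)."""
    # A standard Gaussian vector has a uniformly distributed direction.
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(x * x for x in g))
    # Radius U^(1/m) makes the draw uniform over the ball's volume.
    r = rng.random() ** (1.0 / m)
    return [r * x / norm for x in g]


def stochastic_ball_model(centers, n, seed=0):
    """Draw n points around each fixed center; returns (points, labels)."""
    rng = random.Random(seed)
    m = len(centers[0])
    points, labels = [], []
    for j, c in enumerate(centers):
        for _ in range(n):
            z = sample_unit_ball(m, rng)
            points.append([ci + zi for ci, zi in zip(c, z)])
            labels.append(j)
    return points, labels


# Three ball centers in R^3 with pairwise distances larger than 2 (toy choice).
centers = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (0.0, 2.5, 0.0)]
points, labels = stochastic_ball_model(centers, n=10)
print(len(points), labels[:3], points[0])
```

Because every point lies within distance 1 of its own center and the centers are more than 2 apart, the planted clusters in this example do not overlap.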

artificial intelligence, machine learning, relaxation, (17 more...)

arXiv.org Machine Learning

1505.04778

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.64)

Industry: Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.85)
